Abstract: The ability to place objects in an environment is 
an important skill for a personal robot. An object should not 
only be placed stably, but should also be placed in its preferred 
location/orientation. For instance, it is preferred that a plate 
be inserted vertically into the slot of a dish-rack as compared 
to being placed horizontally in it. Unstructured environments 
such as homes have a large variety of object types as well as 
of placing areas. Therefore our algorithms should be able to 
handle placing new object types and new placing areas. These 
reasons make placing a challenging manipulation task. 

In this work, we propose a supervised learning approach
for finding good placements given the point clouds 
of the object and the placing area. It learns to combine the 
features that capture support, stability and preferred place- 
ments using a shared sparsity structure in the parameters. 
Even when neither the object nor the placing area is seen 
previously in the training set, our algorithm predicts good 
placements. In extensive experiments, our method enables the 
robot to stably place several new objects in several new placing 
areas with a 98% success-rate, and it placed the objects in 
their preferred placements in 92% of the cases. 

I. Introduction 

In several manipulation tasks of interest, such as arrang- 
ing a disorganized kitchen, loading a dishwasher or laying 
a dinner table, a robot needs to pick up and place objects. 
While grasping has attracted great attention in previous 
works, placing remains under-explored. To place objects 
successfully, a robot needs to figure out where and in what 
orientation to place them — even in cases when the objects 
and the placing areas may have not been seen before. 

Given a designated placing area (e.g., a dish-rack), this 
work focuses on finding good placements (which includes 
the location and the orientation) for an object. An object 
can be placed stably in several different ways — for example, 
a plate could be placed horizontally on a table or placed 
vertically in the slots of a dish-rack, or even be side- 
supported when other objects are present (see Fig. 1). A 
martini glass should be placed upright on a table but upside 
down on a stemware holder. In addition to stability, some 
objects also have a 'preferred' placing configuration that can 
be learned from prior experience. For example, long thin 
objects (e.g., pens, pencils, spoons) are placed horizontally 
on a table, but vertically in a pen- or cutlery-holder. Plates 
and other 'flat' objects are placed horizontally on a table, 
but plates are vertically inserted into the 'slots' of a dish- 
rack. Thus there are certain common features depending 
on the shape of objects and placing areas that indicate 
their preferred placements. These reasons make the space 
of potential placing configurations of common objects in 




Fig. 1: How to place an object depends on the shape of the object 
and the placing environment. For example, a plate could be placed 
vertically in the dish rack (left), or placed slanted against a support 
(right). Furthermore, objects can also have a 'preferred' placing 
configuration. E.g., in the dish rack above, the preferred placement 
of a plate is vertically into the rack's slot and not horizontally in 
the rack. 



indoor environments very large. The situation is further 
exacerbated when the robot has not seen the object or the 
placing area before. 

In this work, we compute a number of features that 
indicate stability and supports, and rely on supervised learn- 
ing techniques to learn a functional mapping from these 
features to good placements. We learn the parameters of our 
model by maximizing the margin between the positive and 
the negative class (similar to SVMs [1]). However, we note 
that while some features remain consistent across different 
objects and placing areas, there are also features that are 
specific to particular objects and placing areas. We therefore 
impose a shared sparsity structure in the parameters while 
learning. 

For training our model, we obtain ground-truth labeled 
data using rigid-body simulation. During robotic experi- 
ments, we first use a pre-constructed database of 3D models 
of objects to recover the point cloud, and then evaluate the 
potential placements using the learned model for obtaining 
a ranking. We next validate the top few placements in a 
rigid-body simulator (which is computationally expensive), 
and perform the placements using our robotic arm. 

We test our algorithm extensively in the tasks of placing 
several objects in different placing environments. (See 
Fig. 2 for some examples.) The scenarios range from simple 
placing of objects on flat surfaces to narrowly inserting 
plates into dish-rack slots to hanging martini glasses upside 



Fig. 2: Examples of a few objects (plate, martini glass, bowl) and placing areas (rack-1, rack-3, stemware holder). For the full list, see Table III.



down on a holding bar. We perform our experiments with a 
robotic arm that has no tactile feedback (which makes good 
predictions of placements important). Given an object's 
grasping point and the placing area, our method enables 
our robot to stably place previously unseen objects in 
new placing areas with 98% success-rate. Furthermore, the 
objects were placed in their preferred configurations in 98% 
of the cases when the object and placing areas were seen 
before, and in 92% of the cases when the objects and the 
placing areas were new. 

The contributions of this work are as follows: 

• While some prior work studies finding 'flat' surfaces 
[2], we believe that this is the first work that considers 
placing new objects in complex placing areas. 

• Our learning algorithm captures features that indicate 
stability of the placement as well as the 'preferred' 
placing configuration for an object. 



II. Related Work 

There is little previous work on placing objects, and it 
is restricted to placing objects upright on 'flat' surfaces. 
Edsinger and Kemp [3] considered placing objects on a flat 
shelf. The robot first detected the edge of the shelf, and then 
explored the area with its arm to confirm the location of the 
shelf. It used passive compliance and force control to gently 
and stably place the object on the shelf. This work indicates 
that even for flat surfaces, unless the robot knows the 
placing strategy very precisely, it takes good tactile sensing 
and adjustment to implement placing without knocking 
the object down. Schuster et al. [2] recently developed a 
learning algorithm to detect clutter-free 'flat' areas where 
an object can be placed. While these works assume that 
the given object is already in its upright orientation, some 
other related works consider how to find the upright or the 
current orientation of the objects, e.g., Fu et al. [4] proposed 
several geometric features to learn the upright orientation 
from an object's 3D model and Saxena et al. [5] predicted 
the orientation of an object given its image. Our work is 
different from and complementary to these studies: we generalize 
the placing environment from flat surfaces to more complex 
ones, and desired configurations are extended from upright 
to all other possible orientations that can make the best use 
of the placing area. 



Planning and rule-based approaches have been used in 
the past to move objects around. For example, Lozano-Perez 
et al. [6] considered picking up and placing objects 
by decoupling the planning problem into parts, and tested 
on grasping objects and placing them on a table. Sugie et 
al. [7] used rule-based planning in order to push objects on 
a table surface. However, these approaches assume a known 
full 3D model of the objects, consider only flat surfaces, 
and do not model preferred placements. 

In a related manipulation task, grasping, learning algorithms 
have been shown to be promising. Saxena et 
al. [8]-[10] used supervised learning with images and 
partial point clouds as inputs to infer good grasps. Later, 
Le et al. [11] and Jiang et al. [12] proposed new learning 
algorithms for grasping. Rao et al. [13] used point clouds 
to facilitate segmentation as well as learning grasps. Li 
et al. [14] combined object detection and grasping for 
higher accuracy in grasping point detection. In some grasping 
works that assume known full 3D models (such as 
GraspIt [15]), different 3D locations/orientations of the 
hand are evaluated for their grasp scores. Berenson et 
al. [16] consider grasp planning in complex scenes. 
Goldfeder [17] recently discussed a data-driven grasping 
approach. In this paper, we consider learning placements 
of objects in cluttered placing areas, which is a different 
problem because our learning algorithm has to consider 
the object as well as the environment to evaluate good 
placements. To the best of our knowledge, no prior work 
addresses learning to place in complex situations. 

III. System Overview 

As outlined in Fig. 4, the core part of our system is the 
placement classifier with supervised learning (yellow box 
in Fig. 4), which we will describe in detail in Section IV-B. 
In this section, we briefly describe the other parts. 

A. Perception 

We use a stereo camera to perceive objects and placing 
areas. However, it can capture the point cloud only partially 
due to shiny/textureless surfaces or occlusions. To 
recover the entire geometry, a database of parameterized 
objects with a variety of shapes is created beforehand. A 
scanned partial point cloud is registered against the objects 
in the database using the Iterative Closest Point (ICP) 
algorithm [18], [19]. The best matching object from the 
database is used to represent the completed geometry of 
the scanned object. While this 3D recovery scheme is not 



Fig. 3: Some snapshots from our rigid-body simulator showing different objects placed in different placing areas. (Placing areas from 
left: rack-1, rack-2, rack-3, flat surface, pen holder, stemware holder, hook, hook and pen holder. Objects from left: mug, martini 
glass, plate, bowl, spoon, martini glass, candy cane, disc and tuning fork.) 




Fig. 4: System Overview: The core part is the placement 
classifier (yellow box) which is trained using supervised 
learning to identify a few candidates of placement configu- 
rations based on the perceived geometries of environments 
and objects. Those candidates are validated by a rigid body 
simulation to determine the best feasible placement which 
is then fed to the robot controller. 



compulsory, with our particular stereo camera we have 
found that it significantly improves the performance. In this 
work, our goal is to study placement, therefore we simplify 
perception by assuming a good initial guess of the object's 
initial location. 

B. Simulation 

In our pipeline, rigid-body simulation is used for two 
purposes: (a) to generate training data for our supervised 
learning algorithm, and (b) to verify the predicted placements 
suggested by the classifier (see Fig. 3 for some 
simulated placing tasks). 

A placement defines the location $T$ and orientation $R$ 
of the object in the environment. Its motion is computed 
by the rigid-body simulation at discrete time steps. At each 
time step, we compute the kinetic energy change of the 
object, $\Delta E = E_n - E_{n-1}$. The simulation runs until the 
kinetic energy is almost constant ($\Delta E < \delta$), at which point a 
stable state of the object can be assumed. Let $T_s$ and $R_s$ 
denote the stable state. We label the given placement as 
a valid one if the final state is close enough to the initial 
state, i.e., $\|T_s - T\|_2^2 + \|R_s - R\|_2^2 < \delta_s$. 
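
The labeling rule above can be sketched in a few lines. This is only an illustration under stated assumptions: `step` is a hypothetical callable standing in for one time step of a rigid-body simulator (it is not part of any real simulator API), the orientation is assumed to be a plain tuple of numbers, the convergence test uses the absolute energy change, and the threshold values are placeholders rather than the ones used in our experiments.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_placement(T, R, step, delta=1e-6, delta_s=1e-2, max_steps=10_000):
    """Return True if the object settles close to its initial placement (T, R).

    `step(T, R)` is a hypothetical simulator hook that advances one time step
    and returns (new_T, new_R, kinetic_energy). `delta` and `delta_s` are
    placeholder thresholds for the energy change and the state change.
    """
    cur_T, cur_R = T, R
    prev_energy = 0.0  # assumption: the object is released at rest
    for _ in range(max_steps):
        cur_T, cur_R, energy = step(cur_T, cur_R)
        # Stop once the kinetic energy is almost constant between steps.
        if abs(energy - prev_energy) < delta:
            break
        prev_energy = energy
    # Valid placement: the stable state stays close to the initial state.
    return squared_distance(cur_T, T) + squared_distance(cur_R, R) < delta_s
```

A placement that never moves is labeled valid, while one that falls a noticeable distance before coming to rest is labeled invalid.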

This simulation is computationally expensive. Thanks 
to our classifier, we only need to perform the rigid-body 
simulation on a few suggested placements, thus making it 
a much more efficient process as compared to checking all 
random placements. 

Since the simulation itself has no knowledge of placing 
preferences, when creating the ground-truth training data, 
we manually labeled all the stable (as verified by the simulation) 
but un-preferred placements as negative examples. 



C. Realization 

To realize a placement decision, our robotic arm grasps 
the object, moves it to the placing location with desired 
orientation, and releases it. Our main goal in this paper 
is to learn how to place objects well, therefore in our 
experiments the kinematically feasible grasping points are 
pre-calculated. The location and geometry of the object and 
the environment are computed from the real point cloud, using 
the registration algorithm in Section III-A. The robot then 
uses inverse kinematics to figure out the arm configuration 
and plans a path to take the object to the predicted placing 
configuration (including location and orientation), and then 
releases it for placing. 



IV. Learning Approach 



A. Features 



In this section, we describe the features used in our 
learning algorithm. Given the point cloud of both object 
and placing area, we first randomly sample some points 
in the bounding box of the environment as the placement 
candidates. For each candidate, the features are computed 
to reflect the possibilities to place the object there. In 
particular, we design the features to capture the following 
two properties: 

• Supports and Stability. The object should be able to 
stay still after placing, and even better, to be able to 
withstand small perturbations. 

• Preferred placements. Placements should also follow 
common practice. For example, the plates should be 
inserted into a dish-rack vertically and glasses should 
be placed upside down on a stemware holder. 

In the following description, we use $O$ to denote the set 
of object points, $p_o$ is the 3D coordinate of a point $o$ on 
the object, and $x_t$ denotes the coordinate of a point $t$ in 
the placing area point cloud. 

Supporting Contacts: We propose features that reflect 
the support of the object from the placing environment. In 
particular, given an object represented by $n$ points and a 
possible placement, we first compute the vertical distance, 
$c_i$, $i = 1 \ldots n$, between each object point and the placing area 
(Fig. 5a). The minimum $k$ distances are then quantified 
into three features: 1) minimum distance $\min_{i=1 \ldots k} c_i$, 
2) maximum distance $\max_{i=1 \ldots k} c_i$, and 3) the variance 
$\frac{1}{k}\sum_{i=1}^{k}(c_i - \bar{c})^2$, where $\bar{c} = \frac{1}{k}\sum_{i=1}^{k} c_i$. 

Fig. 5: Illustration of features in 2D: (a) supporting contacts, (b) caging (side view), (c) caging (top view), (d) signatures of geometry. 
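
As a small illustration of the three supporting-contact features, the sketch below takes the vertical distances as given and summarizes the $k$ smallest ones. The function name and the way the distances are supplied are assumptions made for this example; how the vertical distance to the placing area is measured is not shown here.

```python
def support_features(vertical_distances, k):
    """Summarize the k smallest vertical distances as (min, max, variance).

    `vertical_distances` holds one distance per object point; computing them
    from the point clouds is outside the scope of this sketch.
    """
    nearest = sorted(vertical_distances)[:k]
    mean = sum(nearest) / len(nearest)
    variance = sum((c - mean) ** 2 for c in nearest) / len(nearest)
    return min(nearest), max(nearest), variance


# Example: five object points, keeping the three closest to the placing area.
minimum, maximum, variance = support_features([0.3, 0.0, 0.1, 0.2, 5.0], k=3)
```

A small maximum and a small variance indicate that several contact points lie at nearly the same height, which is what one would expect of a well-supported object.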



Caging: When the object is placed stably, not only is it 
supported by vertical contacts, but it may also lean against 
other local parts of the environment and be "caged" by 
gravity and the surrounding environment. Caging also 
ensures robustness to perturbations. For instance, consider 
a pen placed upright in a holder. While it has only a 
few vertical supporting contacts and may move because 
of a perturbation, it will still remain in the holder because 
of caging. To capture the local caging shape, we divide the 
space around the object into $3 \times 3 \times 3$ zones. The whole 
divided space is the axis-aligned bounding box of the object 
scaled by 1.6, and the center zone is 1.05 times the 
bounding box (Fig. 5b and 5c). The point cloud of the 
placing area is partitioned into these zones, labelled 
$\Psi_{ijk}$, $i, j, k = 1, 2, 3$, where $i$ indexes the vertical direction 
$e_1$, and $j$ and $k$ index the other two orthogonal directions, 
$e_2$ and $e_3$, on the horizontal plane. 



From the top view, there are 9 regions (Fig. 5c), each of 
which covers three zones in the vertical direction. For each 
region, the height of the highest point in vertical direction 
is computed. This leads to 9 features. In addition, we use 
the horizontal distance between environment and object to 
capture the possible side support. In particular, for each 
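$i = 1, 2, 3$, we compute 

$$
\begin{aligned}
d_{i1} &= \min_{x_t \in \Psi_{i11} \cup \Psi_{i12} \cup \Psi_{i13},\; p_o \in O} \; e_2^T (p_o - x_t) \\
d_{i2} &= \min_{x_t \in \Psi_{i31} \cup \Psi_{i32} \cup \Psi_{i33},\; p_o \in O} \; -e_2^T (p_o - x_t) \\
d_{i3} &= \min_{x_t \in \Psi_{i11} \cup \Psi_{i21} \cup \Psi_{i31},\; p_o \in O} \; e_3^T (p_o - x_t) \\
d_{i4} &= \min_{x_t \in \Psi_{i13} \cup \Psi_{i23} \cup \Psi_{i33},\; p_o \in O} \; -e_3^T (p_o - x_t)
\end{aligned}
\tag{1}
$$

and produce 12 additional features. 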
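
The zone partition and the 9 height features can be sketched as follows. This is a minimal illustration, not our implementation: it assumes points are (x, y, z) tuples with z vertical, that the scale factors 1.6 and 1.05 apply to the half-extents of the object's bounding box about its center, and that an empty region is reported as `None`. The function names are hypothetical.

```python
def zone_index(value, center, half_extent):
    """Return zone 1, 2 or 3 along one axis, or None outside the scaled box."""
    offset = value - center
    if abs(offset) > 1.6 * half_extent:
        return None  # outside the bounding box scaled by 1.6
    if offset < -1.05 * half_extent:
        return 1
    if offset > 1.05 * half_extent:
        return 3
    return 2  # center zone: 1.05 times the bounding box


def column_heights(env_points, box_min, box_max):
    """Highest environment point in each of the 9 top-view regions (j, k)."""
    center = [(lo + hi) / 2 for lo, hi in zip(box_min, box_max)]
    half = [(hi - lo) / 2 for lo, hi in zip(box_min, box_max)]
    heights = {(j, k): None for j in (1, 2, 3) for k in (1, 2, 3)}
    for x, y, z in env_points:
        i = zone_index(z, center[2], half[2])  # vertical zone
        j = zone_index(x, center[0], half[0])  # first horizontal direction
        k = zone_index(y, center[1], half[1])  # second horizontal direction
        if None in (i, j, k):
            continue
        if heights[(j, k)] is None or z > heights[(j, k)]:
            heights[(j, k)] = z
    return heights
```

Each top-view region pools the three vertical zones above it, so only the (j, k) pair is used as the key.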

Signatures of Geometry: A placement strategy in general 
depends on the geometric shapes of both the object and 
the environment. To abstract the geometries, we propose 
the signatures of point-cloud objects and environments and 
use them as features in our learning algorithm. 

To compute the signatures of the object, we first compute 
the spherical coordinates of all points with the origin at 
the placing point $p$ (see Fig. 5d). Let $(\rho_i, \theta_i, \phi_i)$ denote 
the spherical coordinate of point $i$. We then partition these 
points by their inclination and azimuth angles. Given a 
spherical region $\Omega_{ab}$, $a = 0 \ldots 3$, $b = 0 \ldots 7$, a point $i$ 
is in $\Omega_{ab}$ when it satisfies $45^\circ a \le \theta_i < 45^\circ (a + 1)$ and 
$45^\circ b \le \phi_i < 45^\circ (b + 1)$. This partition leads to 32 regions. 
We count the number of points in each region as a feature, 
creating 32 more features in total. 

For the signatures of the environment, we conduct a similar 
process, but consider only the local points of the environment 
around the placing point $p$. Let 

$$\rho_{\max} = \max_{\text{object point } i} \rho_i .$$

Only the environment points whose distance to $p$ is less 
than $1.5\rho_{\max}$ are partitioned into the aforementioned 32 
regions. This produces 32 more features. 

To capture the matching between environment and object 
geometries, we first compute two values for each of the 32 
regions $\Omega_{ab}$: 

$$
t_{ab} = \min_{\substack{\text{environment point } i \in \Omega_{ab} \\ \rho_i < 1.5\rho_{\max}}} \rho_i , 
\qquad 
c_{ab} = \max_{\text{object point } i \in \Omega_{ab}} \rho_i ,
\tag{2}
$$

and compute $c_{ab}/t_{ab}$ as a feature. Note that if there is no 
object or environment point in some region, we simply set a 
fixed number ($-1$ in practice) for the corresponding feature. 

In total, we generate 120 features: 3 features for supporting 
contacts, 21 features for caging, and 96 for the 
signatures of geometry. They are concatenated into a vector 
$v \in \mathbb{R}^{120}$, and used in the learning algorithm described in 
the following sections. 
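
The 32-bin point-count signature can be illustrated with a short sketch. It assumes points are (x, y, z) tuples with z as the polar axis, measures the inclination from that axis and the azimuth in the x-y plane, and skips any point that coincides with the placing point; these conventions and the function name are assumptions for this example.

```python
import math


def spherical_histogram(points, origin):
    """Count points in 4 inclination x 8 azimuth bins of 45 degrees each."""
    counts = [[0] * 8 for _ in range(4)]
    for point in points:
        dx, dy, dz = (c - o for c, o in zip(point, origin))
        rho = math.sqrt(dx * dx + dy * dy + dz * dz)
        if rho == 0.0:
            continue  # the placing point itself has no direction
        cos_theta = max(-1.0, min(1.0, dz / rho))
        theta = math.degrees(math.acos(cos_theta))       # inclination, 0..180
        phi = math.degrees(math.atan2(dy, dx)) % 360.0   # azimuth, 0..360
        a = min(int(theta // 45), 3)  # clamp theta == 180 into the last bin
        b = min(int(phi // 45), 7)    # clamp rounding at 360 into the last bin
        counts[a][b] += 1
    # Flatten row-major: index = 8 * a + b.
    return [n for row in counts for n in row]
```

The same routine can be applied to the environment after discarding points farther than $1.5\rho_{\max}$ from the placing point.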

B. Learning Algorithm 

We frame the manipulation task of placing as a super- 
vised learning problem. Given the features computed from 
the point-clouds of the object and the environment, the 
goal of the learning algorithm is to find a good placing 
hypothesis. 

If we look at the objects and their placements in the 
environment, we notice that there is an intrinsic difference 
between different placing settings. For example, it seems 
unrealistic to assume placing dishes into a rack and hanging 
martini glasses upside down share exactly the same hypoth- 
esis, although they might agree on a subset of attributes. 
I.e., while some attributes may be shared across different 



objects and placing settings, there are some attributes that 
are specific to the particular setting. In such a scenario, it 
is not sufficient to have either one single model or several 
completely independent models for each placing setting. 
The latter also tends to suffer from over-fitting easily. 
Therefore, in this work we propose to use shared sparsity 
structure in our learning model. 

Say we have $M$ objects and $N$ placing areas, thus 
making a total of $r = MN$ placing settings. We call each 
of these settings a 'task'. Each task can have its own model, 
but intuitively they should share parameters underneath. To 
quantify this constraint, we use the idea from a recent work 
of Jalali et al. [20] that proposes to use a shared sparsity 
structure for multiple linear regression. We extend their 
model to the classic soft-margin SVM [1]. (For 
experimental details and results, see Section V-B and 
Section V-D.) 

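In detail, for $r$ tasks, let $X_i \in \mathbb{R}^{p \times n_i}$ and $Y_i$ denote 
the training data and its corresponding labels, where $p$ is the 
size of the feature set and $n_i$ is the number of data points 
in task $i$. If we treat the $r$ tasks independently, we would get 
the following goal function based on the classic SVM, 

$$
\begin{aligned}
\min_{\omega_i, b_i} \quad & \sum_{i=1}^{r} \Big( \tfrac{1}{2}\|\omega_i\|_2^2 + C \sum_{j=1}^{n_i} \xi_{i,j} \Big) \\
\text{subject to} \quad & Y_i^j (\omega_i^T X_i^j + b_i) \ge 1 - \xi_{i,j}, \quad \xi_{i,j} \ge 0, \\
& \forall\, 1 \le i \le r, \; 1 \le j \le n_i
\end{aligned}
\tag{3}
$$

where $\omega_i \in \mathbb{R}^p$ is the learned model for the $i$th task, $C$ is the 
trade-off between the margin and training error, and $\xi_{i,j}$ is 
the slack variable. 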

Now, we modify the objective function above. We model 
each $\omega_i$ as composed of two parts, $\omega_i = S_i + B_i$: the term 
$S_i$ represents the self-owned features and $B_i$ represents the 
shared features. Each $S_i$ should have only a few non-zero 
values, so that it can reflect individual differences to some 
extent but does not become dominant in the final model. The 
$B_i$ need not have identical values, but should share a 
similar sparsity structure across tasks; i.e., each feature 
should be either active in nearly all tasks or inactive in nearly all. 
Let $\|S\|_{1,1} = \sum_{i,j} |S_i^j|$ and $\|B\|_{1,\infty} = \sum_{j=1}^{p} \max_i |B_i^j|$, 
where $S_i^j$ and $B_i^j$ denote the $j$th entries of $S_i$ and $B_i$. 
Our new goal function is now: 

$$
\begin{aligned}
\min_{\omega_i, b_i} \quad & \sum_{i=1}^{r} \Big( \tfrac{1}{2}\|\omega_i\|_2^2 + C \sum_{j=1}^{n_i} \xi_{i,j} \Big) + \lambda_S \|S\|_{1,1} + \lambda_B \|B\|_{1,\infty} \\
\text{subject to} \quad & Y_i^j (\omega_i^T X_i^j + b_i) \ge 1 - \xi_{i,j}, \quad \xi_{i,j} \ge 0, \\
& \forall\, 1 \le i \le r, \; 1 \le j \le n_i
\end{aligned}
\tag{4}
$$

This function contains two penalty terms, one each for $S$ and $B$, 
with hand-tuned coefficients $\lambda_S$ and $\lambda_B$. Because $\|S\|_{1,1}$ is 
defined as the sum of absolute values of the elements in each 
$S_i$, it can effectively control the magnitude of $S$ without 
interfering with the internal structure of $S$. For sharing, 
$\|B\|_{1,\infty}$ encourages all $B_i$ to simultaneously assign large 
weights (either positive or negative) to the same set 
of features. This is because no additional penalty is added 
for increasing $B_i^j$ if it is not already the maximum one. 
This modification indeed results in superior performance 
with new objects in new placing areas. 
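
To make the behavior of the two penalties concrete, the following sketch evaluates them on small hand-written matrices. It assumes each matrix is a list of rows with one row per task and one column per feature; the function names and the numbers are made up for illustration.

```python
def norm_1_1(S):
    """Sum of absolute values of every entry (rows are tasks)."""
    return sum(abs(value) for row in S for value in row)


def norm_1_inf(B):
    """For each feature (column), take the largest absolute value across
    tasks, then sum these maxima over the features."""
    num_features = len(B[0])
    return sum(max(abs(row[j]) for row in B) for j in range(num_features))


# Two tasks, three features (illustrative values only).
B = [[2.0, 0.0, -1.0],
     [1.0, 0.0, 3.0]]
penalty_before = norm_1_inf(B)   # 2 + 0 + 3

# Raising task 2's weight on feature 1 up to the current maximum is free.
B[1][0] = 2.0
penalty_after = norm_1_inf(B)    # still 2 + 0 + 3
```

This is the mechanism that lets tasks share active features: once one task pays for a feature, the others can use it up to the same magnitude at no extra cost, whereas every non-zero entry of $S$ is paid for individually.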

We transform this optimization problem into a standard 
quadratic programming (QP) problem by introducing 
auxiliary variables to substitute for the absolute and the 
maximum value terms. Unlike Equation (3), which decomposes 
into $r$ sub-problems, the optimization problem in 
Equation (4) becomes larger, and hence takes a lot of 
computational time to learn the parameters. However, inference is 
still fast since predicting the score is simply the dot product 
of the features with the learned parameters. During the test, 
if the task is in the training set, then its corresponding 
model is used to compute the scores. Otherwise we use a 
voting system, in which we average the scores from all 
the models in the training data to predict the score for the new 
situation (see Section V-B for different training settings). 
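
The inference rule, including the voting fallback, can be sketched as below. The dictionary layout, the task keys, and the inclusion of a bias term in the score are assumptions made for this illustration.

```python
def score(weights, bias, features):
    """Linear score of one placement candidate under one task model."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def rank_placements(candidates, models, task=None):
    """Rank candidate feature vectors from best to worst.

    `models` maps a task key to (weights, bias). If `task` was seen in
    training, only its model is used; otherwise the scores of all trained
    models are averaged (the voting system).
    """
    used = [models[task]] if task in models else list(models.values())
    scores = [
        sum(score(w, b, v) for w, b in used) / len(used) for v in candidates
    ]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Illustrative models for two trained tasks and two candidate placements.
models = {
    ("plate", "rack-1"): ([1.0, 0.0], 0.0),
    ("mug", "flat surface"): ([0.0, 1.0], 0.0),
}
candidates = [[1.0, 0.0], [0.0, 3.0]]
seen_order, _ = rank_placements(candidates, models, task=("plate", "rack-1"))
new_order, new_scores = rank_placements(candidates, models, task=("bowl", "rack-3"))
```

Only the top few entries of the returned ranking would then be passed on to the rigid-body simulation for validation.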



V. Experiments 

A. Robot Hardware and Setup 

We use an Adept Viper s850 arm with six degrees of 
freedom, equipped with a parallel plate gripper that can 
open to a maximum width of 5 cm. The arm has 
a reach of 105 cm, together with our gripper. The arm 
plus gripper has a repeatability of about 1 mm in XYZ 
positioning, but there is no force or tactile feedback in 
our arm. We use a Bumblebee¹ camera to obtain the point 
clouds. 

B. Learning Scenarios 

In real-world placing, the robot may or may not have seen 
the placing locations (and the objects) before. Therefore, we 
train our algorithm for four different scenarios: 

1) Same Environment Same Object (SESO), 

2) Same Environment New Object (SENO), 

3) New Environment Same Object (NESO), 

4) New Environment New Object (NENO). 

If the environment is 'same', then only this environ- 
ment is included in the training data, otherwise all other 
environments are included except the one for test. This is 
done similarly for the objects. Through these four scenarios, 
we would be able to observe the algorithm's performance 
thoroughly. 
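
The rule for assembling the training set in each scenario can be written compactly. The sketch below is illustrative only: the function name, the boolean flags, and the object and environment names in the example are assumptions.

```python
def training_tasks(test_object, test_env, objects, environments,
                   same_object, same_env):
    """List the (object, environment) pairs used for training.

    'Same' keeps only the test item; 'new' keeps every item except it.
    """
    if same_object:
        train_objects = [test_object]
    else:
        train_objects = [o for o in objects if o != test_object]
    if same_env:
        train_envs = [test_env]
    else:
        train_envs = [e for e in environments if e != test_env]
    return [(o, e) for o in train_objects for e in train_envs]


objects = ["plate", "mug", "bowl"]
environments = ["rack-1", "flat surface"]

# NENO: neither the plate nor rack-1 appears in the training tasks.
neno = training_tasks("plate", "rack-1", objects, environments,
                      same_object=False, same_env=False)
# SESO: only the plate / rack-1 setting is used.
seso = training_tasks("plate", "rack-1", objects, environments,
                      same_object=True, same_env=True)
```

SENO and NESO follow by setting exactly one of the two flags.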

We also compare our algorithm with the following three 
heuristic methods: 

1) Chance. This method randomly samples a "collision- 
free" location (from the bounding box of the environ- 
ment) and an orientation for placing. 

2) Flat surface upright rule. Several methods exist for 
finding 'flat' surfaces [2], and we consider a placing 
method based on finding flat surfaces. In this method, 
objects would be placed with pre-defined upright 
orientation on the surface when flat surfaces exist in 
a placing area such as a table or a pen holder. When 
no flat surface can be found, such as for racks or 
stemware holder, this method would pick placements 
randomly. 

3) Finding lowest placing point. For many placing areas, 
such as dish-racks or containers, a lower placing 
point (see Section IV-A) often gives more stability. 
Therefore, this heuristic rule chooses the placing 
point with the lowest height. 

1 http://www.ptgrey.com/products/stereo.asp 

TABLE I: Average performance of the algorithm when used 
with different types of features. Tested on SESO scenario with 
independent SVMs. 

TABLE II: Average performance for different training methods: 
joint SVM, independent SVM with voting and shared sparsity 
SVM with voting for NENO scenario. 

C. Evaluation Metrics 

We evaluate our algorithm's performance on the following 
metrics: 

• $R_0$: Rank of the first valid placement. ($R_0 = 1$ in the 
best case.) 

• Precision@n: In top $n$ candidates, the fraction of valid 
placements. Specifically, we choose $n = 5$ in our 
evaluations. ($0 \le$ Pre@n $\le 1$.) 

• $P_{stability}$: Success-rate (in %) of robotic placement in 
placing the object stably, i.e., the object does not move 
much after placing. 

• $P_{preference}$: Success-rate (in %) of robotic placements 
while counting even stable placements as incorrect if 
they are not the 'preferred' ones. 

Except for $R_0$, the other three metrics represent precision, and 
thus higher values indicate higher performance. 
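
The two ranking metrics can be computed directly from the validity labels of the ranked candidates. The sketch below assumes the candidates are already sorted by predicted score and that at least $n$ candidates are available; the function names are hypothetical.

```python
def rank_of_first_valid(valid_flags):
    """1-based rank of the first valid placement, or None if none is valid."""
    for rank, valid in enumerate(valid_flags, start=1):
        if valid:
            return rank
    return None


def precision_at_n(valid_flags, n=5):
    """Fraction of valid placements among the top n ranked candidates."""
    return sum(1 for valid in valid_flags[:n] if valid) / n


# Validity of six candidates, ordered from highest to lowest predicted score.
ranked_validity = [False, True, True, False, True, True]
first_valid = rank_of_first_valid(ranked_validity)  # second candidate
pre_at_5 = precision_at_n(ranked_validity, n=5)     # 3 valid among the top 5
```

The rank of the first valid placement is also the number of simulations needed before a feasible placement is found, which is why a lower value is better for that metric.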

D. Learning Experiments 

For evaluating our learning algorithm, we considered 7 
environments and 8 objects (shown in Fig. 3). In detail, we 
generated one dataset each for training and test for each 
setting (i.e., an object-environment pair). In each dataset, 
100 distinct 3D locations are paired with 18 different 
orientations which gives 1800 different placements. After 
eliminating placements that have collisions, we have 37655 
placements in total. 

Table I shows the average performance when we use 
different types of features: supporting contacts, caging 
and geometric signatures. While all three types of 
features outperform chance, combining them together gives 
the best $R_0$ and Pre@5. Next, Table II shows the 
comparison of three variations of SVM learning algorithms: 
1) joint SVM, where one single model is learned from all the 
placing settings in the training dataset; 2) independent SVM, 
which treats each setting as a learning task and learns a separate 
model for every setting; 3) shared sparsity SVM, which also 
learns one model per setting but with parameter sharing. 
Both independent and shared sparsity SVM use voting to 
rank placements for the test case. Table II shows that in the 
hardest learning scenario, NENO, the shared sparsity SVM 
performs best. The result also indicates that independent 
SVM with voting is better than joint SVM. This could 
be due to the large variety in the placing situations in 
the training set, so that imposing one model for all tasks 
decreases the performance. 

Table III shows the performance of the different algorithms 
described in Section V-B on various placing tasks. 
For each row in Table III, the numbers are averaged across 
the objects for each environment (when listed environment-wise) 
and are averaged across the environments for each 
object (when listed object-wise). 

There is a large variety in the objects as well as in 
the environments, leading to a large number of possible 
placements. Thus one can hardly find a heuristic that would 
find valid placements in all the cases. Not surprisingly, the 
chance method performs poorly (Pre@5 = 0) because there 
are very few preferred placements in the large sampling 
space of possible placements. The two heuristic methods 
perform well in some obvious cases, e.g., the flat-surface-upright 
method works well for flat surfaces, and the lowest-point 
method works reasonably in 'cage-like' environments 
such as a pen holder. However, their performance varies 
significantly in non-trivial cases. They perform poorly in 
many cases including the stemware holder and the hook. 

We get close to perfect results for the SESO case, i.e., the 
learning algorithm can very reliably predict object placements 
if a known object is being placed in a previously 
seen location. The performance is still very high for SENO 
(i.e., even if the robot has not seen the object before), 
where the first correct prediction is ranked 1.8 on average. 
This means we only need to perform simulation about twice. 
However, for NESO the performance starts to deteriorate: 
the average number of simulations needed is 4.8 because 
of poor performance in placing the martini glass on the flat 
surface and the stemware holder. 

The last learning scenario, NENO, is extremely 
challenging: here, for each object/placing area pair, the 
algorithm is trained without either the object or the placing 
area. With the same algorithm, independent SVM with 
voting, $R_0$ increases to 5.3. However, shared sparsity SVM 
(the last column in the table) helps to reduce the average 
$R_0$ to 1.9. It is worth noting that in cases where the 
placing strategy is very different from the ones trained on, 
our algorithm does not perform well, e.g., $R_0$ is 4.0 for 
the martini glass and stemware holder. This issue can 
potentially be addressed by expanding the training dataset. 

Note that for placing the objects in a designated placing 
area, our method relies on learning algorithms trained from 



TABLE III: Learning experiment statistics: The performance of different learning algorithms in different scenarios is shown. The 
first three double columns are the results for baselines, where no training data is used. Columns under 'independent SVM' are trained 
using a separate classic SVM on each task, under four learning scenarios. The last double column is trained via shared sparsity SVM 
only for NENO. For the particular case of the martini glass and stemware holder, statistics for SENO are not available 
because no other objects can be well placed in this environment. 

Listed environment-wise, averaged over the objects. 

Fig. 6: Some screenshots of our robot placing different objects in several placing areas. In the examples above, the robot placed the 
objects in stable as well as preferred orientations, except for the top right image where a bowl is placed stably in upright orientation 
on rack-1. However, a more preferred orientation is upside down. 



data instead of relying on hard-coded rules. The assumption 
about the pre-defined object-specific grasping locations can 
be eliminated by other grasping algorithms, e.g., [12]. We 
believe that this would enable our approach to extend to 
different and new placing scenarios. 



E. Robotic Experiments 

We conducted experiments on our robotic arm with the 
system as described in Section III. For training, we used 
the same dataset as for the learning experiments in the previous 
section. We performed a total of 100 trials. 

Table IV shows results for the objects being placed by 
the robotic arm in four placing scenarios: flat surface, rack-1, 
rack-3 and stemware holder. We see that our SESO case 
obtains a 98% success rate in placing the objects, which 
demonstrates the performance of our overall 
system. It failed only in one experiment, when the bowl slid 
from the bump of the rack because of a small displacement. 

TABLE IV: Robotic performance test results, trained using 
shared sparsity SVM under different learning scenarios: SESO and 
NENO. Five experiments each were performed for each object-placing 
area pair. $P_s$ stands for $P_{stability}$ and $P_p$ stands for $P_{preference}$. 

In our NENO case, we obtain 98% performance if 
we consider only stable placing, but 92% performance 
if we disqualify those stable placements that are not the 
preferred ones. The plate is an object for which these 
two success-rates are quite different: since the learning 
algorithm has never seen the plate before (nor, in fact, the 
placing area), it often predicts a slanted or horizontal 
placement, which even though stable is not the preferred 
way to place a plate in a dish-rack. One failure case 
was caused by an error that occurred during grasping: 
the martini glass slipped a bit in the gripper and thus could 
not fit into the narrow stemware holder. 

Fig. 6 shows several screenshots of our robot placing 
objects. Some of the cases are quite tricky, for example 
placing the martini glass hanging from the stemware holder. 
Videos of our robot placing these objects are available at: 
http://pr.cs.cornell.edu/placingobjects 

VI. Conclusion and Discussion 

In this paper, we considered the problem of placing 
objects in various types of placing areas, especially the 
scenarios when the objects and the placing areas may 
not have been seen by the robot before. We first pre- 
sented features that contain information about stability and 
preferred placements. We then used a learning approach 
based on SVM with shared sparsity structure for predicting 
the top placements. In our extensive learning and robotic 
experiments, we show that different objects can be placed 
successfully in several environments, including flat sur- 
faces, several dish-racks, and hanging objects upside down 
on a stemware holder and a hook. 



There still remain several issues for placing objects that 
we have not addressed yet, such as considering the reach- 
ability of the placements or performing more complicated 
manipulations during placing. Furthermore, we also need 
to consider detecting the appropriate placing areas for an 
object, placing in cluttered scenes, and how to place mul- 
tiple objects in order for a personal robot to complete the 
tasks of interest such as arranging a disorganized kitchen 
or cleaning a cluttered room. 